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I propose an improvement to the implementation of the well established Maximum Entropy 
Method by departing from Bryan's concept 1 1 1 of singular search space. Working with numerical 
data from Lattice QCD |2| 1 found that in cases where prior information is scarce and the number of 
data-points small, extension of the search space can dramatically improve the reconstructed spectra. 
The reason for the inadequacy of Bryan's method and a remedy are discussed based on the shape of 
the basis functions of the search space and results from a mock data analysis. In order to adequately 
approach problems where an exponentially damped Kernel is used, I provide an implementation us- 
ing the C/C++ language that utilizes high precision arithmetic adjustable at run-time |3 1. The LBFGS 
algorithm is included in the code in order to attack problems without the need to resort to a particular 
search space restriction. 

PACS numbers: 



I. INTRODUCTION 

In this paper I would like to discuss an improve- 
ment to the implementation of the Maximum Entropy 
Method (MEM). This algorithm allows to address the 
generally ill-defined problem of determining the inverse 
Laplace Transform from a discrete and noisy set of data- 
points using Bayesian inference. In the present case the 
bilateral Laplace transform is investigated, which con- 
nects the positive definite spectrum p(cu) > with the 
data D (t) via an integral convolution 
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K(t, a))p(uj) 
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In the following I will restrict myself to the exponen- 
tially damped Kernel K(t, cu) — exp[— tcu]. 

In practice the data is obtained by an experimental ap- 
paratus or a numerical simulation and thus is known 
only at discrete points D (ti) — Di and up to a given 
uncertainty given by an error matrix 
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where denotes one of the Nc individual measure- 
ments of the data-point at xt. 

In addition, to carry out the task of determining the 
spectrum from data, we discretize p(uJi) — pi in the in- 
tegral over the frequencies a>i using N^) points between 
an upper and lower cutoff curnax and (V^nm/ i-*^ 

with a 

spacing of Acu = iidsHipiw ^ 

This is the first time we use prior information about 
our system, since tUmax and cumin need to be chosen such 
that all relevant frequencies of the spectrum encoded in 
the data can be accounted for. Such prior information 



can often be derived from sampling theorems in the case 
of an experimental apparatus or the finite size of the im- 
derlying numerical simulation that produces the data- 
points. 

Thus the fully discretized equation we are to sup- 
posed to invert reads 



Di=AwY_ Ku Pi 



(3) 
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It can happen that one is faced with the situation that 
the number of data-points is significantly smaller 
than the number of points one wishes to reconstruct in 
the spectrum. In such cases the inversion of Eq.||3| is 
an inherently ill-defined problem even if perfect data 
were available. Imagine performing a simple fitting, 
i.e. finding a set of points p: that reproduces the data 
Di within the errors cJi — ^/Ca. In such as case many 
degenerate solutions exist, none of which is superior to 
any other. The reason for this is that the finite number 
of data-points can only constrain parts of the spectrum, 
but at this stage we have no way to decide which of the 
reconstructed features in p these correspond to. 

Let me note on another aspect, in that the problem at 
hand is not a linear problem as might be assumed from 
Eq.||3}. The underlying physics required the values of 
p: to be positive definite, which corresponds to an addi- 
tional constraint equation to be met. Therefore even in 
the case that = one cannot directly argue for the 
existence of a unique solution for the inversion of Eq.||3}. 

To address the challenges posed by both finite accu- 
racy and finite resolution in the data, one can introduce 
a regularization for the fitting procedure, which is 
motivated by arguments from Bayesian inference. This 
allows one to understand the above inversion problem 
as a test of hypotheses, in the sense of answering the 
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question: How probable is it that a chosen test-function 
P: is the correct spectral function, given the measured 
data Di and any other prior knowledge available about 
the system. I.e. one rephrases the problem in terms of 
probability distributions (Note that at this point p itself 
does not need to be a probability distribution). 

The ingredients to our problem so far are measured 
data Di and prior knowledge I. The latter is encoded in 
the formalism through a function m[a>], which by defi- 
nition corresponds to the correct spectral function in the 
absence of data. If such prior knowledge is interpreted 
as a sort of additional data available to us, one can con- 
struct from the joint probability function P(p, D, I(m)) 
the following expressions 

P(p, D, I(m)) = P[p|D, I(m)]P[D|I(m)]P[I(m)] (4) 
= P[D|p,I(m)]P[p|I(m)]P[I(m)] (5) 
= P[I(m)lp,D]P[D|p]P[p] (6) 



from which the following relation can be obtained 
P[D|p]P[p|I(m)]P[I(m)|D,p] 



P[p|D,I(m) 



P[DII(m)]P[I(m)|p] 



(7) 



(8) 



In the case that our prior information is exactly known, 
the above term should simplify to the form usually used 
in the MEM 
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[p|D,I(m)] 



P[D|p]P[p|I(m) 
P[D|I(m)] 
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It is thus necessary to find the most probable spectral 
function, i.e a function p^^^M^ which maximizes Eq.||9} 
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-PMEM[p|D,I(ra)] 
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II. STANDARD IMPLEMENTATION 

In order for Eq.||9| to be of use, we need to state the 
explicit form of the factors appearing within. In the fol- 
lowing we neglect the denominator as it does not de- 
pend on the spectral function itself. This leaves us with 
the so called Likelihood probability 

r 1 

P[D|p] = e-^ = exp - - ^ (Dt - DOq^ (Dj - Dp 
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which contains nothing but the usual fitting term and 
a regulator, the so called prior probability. 



P[pil(m)] = e"'^ = exp |^a^ (^pi - mi - Pilog[ 
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S corresponds to the so called Shannon-Janes entropy, 
which assumes the spectrum p to be positive definite. 
Whereas Eq.|(TT| quantifies how well the current test- 
function p reproduces the measured data, Eq.|[l2} tells 
us how well it agrees with our prior knowledge. In the 
optimization task of Eq.fTO) these two terms compete in 
the determination of the extremum of PmemEpID, I{ra)]. 
Since the exponential function of real numbers is a 
monotonous function one usually uses the negative of 
the argument in a numerical implementation, i.e one at- 
tempts to find the minimum of the functional 



Q(p,D,m) = £(D,p) - (xS[m,p) 



(13) 



Three comments are in order. First, one has in- 
troduced an additional parameter into the formalism, 
called a, which however is integrated out self consis- 
tently at the end of the MEM procedure HI MM- Its 
meaning is actually closely related to the confidence 
placed in the prior information, since it weights the im- 
portance of the likelihood against the entropy term. In 
this context it should be understood to quantify how cer- 
tain we are about the information encoded by the prior 
function ra. 

Secondly, the Likelihood probability P[D|p] does not 
depend on any prior information, which is why it is dis- 
tinct from the quantity P[D|p, I] appearing in the usual 
formulation of the MEM [1 , 4-6|. 

Most importantly one should note that there exist a 
proof |6J, which shows that if we supply in addition to 
our measured N^^ data-points additional N(jj points of 
information by introducing the function mi,, the func- 
tional Q(p,D,m) does possess a unique extremum in 
the N uj dimensional space of functions pi, if such a min- 
imum exists. At this true global extremum, the likeli- 
hood C will be of comparable size to the entropy term 
oiS, all of them being of order 0(1 — 10)^. 

Conversely, if we use a restricted search space, e.g. 
in Bryan's approach, we have to make sure that the ob- 
tained extremum is actually a global extremum and not 
just a local extremum in the chosen subspace. 

Taken together the above statements tell us that by 
supplementing the measured data Di by some form 
of additional data-points, called prior information mi, 
the standard fitting procedure is regularized to give 
a unique solution. This solution is characterized by 
having maximum entropy in the sense of maximizing 
Eq.||T2| and minimizing Eq.fTT). 

In regularizing the usual )pfitting we are now able to 
extract in a meaningful way those parts of the spectrum 
that are actually encoded by the given data. I.e. wher- 
ever the data is not capable of constraining the spec- 
trum, our choice of prior function will select the corre- 



(12) 



^ If in the numerical implementation the most probable spectrum still 
remains at values of C larger than ~ 1 00 the discretization in fre- 
quency space in chosen too coarse. 
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spending unique maximum entropy spectrum. Chang- 
ing the prior function will thus lead to different solu- 
tions pi, which differ only in those parts not constrained 
by the data. Invariant features in the spectrum under 
variation of mi can therefore be attributed to an actual 
signal encoded in Dt. 

In order to perform actual computations let us repeat 
the train of thought that leads to the current state of the 
art implementation of the MEM based on Bryan's singu- 
lar search space. 



Bryan's Search Space 

Through the stationary condition 

= 

one is lead to the expression 



6Q 



-alog 



Pi J 



dDf 
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The fraction in the logarithm invites us to make the pos- 
itive definiteness of the spectrum and the prior explicit 
by using the general parametrization pi = raiexp[Qi], 
which gives in vector notation 



' dDP' 



(16) 



Bryan's proposal refers to the introduction of the Singu- 
lar Value Decomposition (SVD) of the transposed kernel 
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Note that the matrix U contains a full orthonormal ba- 
sis of the M'^ . L on the other hand is a diagonal ma- 
trix, which contains only entries different from zero. 
The above (implicit) expression leads Bryan to the incor- 
rect (as will be shown in the next section) assumption 
that the vector a characterizing the global extremum al- 
ways has to lie in the subspace spanned by the first N t: 
columns of the matrix U. Thus his spectral function is 
parametrized using the N^^ values bj 



(18) 



III. INADEQUACY OF THE SVD SEARCH SPACE 

In order to illustrate that restricting one's test func- 
tions to Bryan's singular subspace can lead to an incor- 
rect result for the extremum of Eg. (IS) and that extend- 
ing the search space can remedy such artifacts, I perform 
a mock data analysis using the code given in ||3J. 



Mock data analysis 

The exponentially damped Kernel 
K(t, cu) — exp[— TO)] 



(19) 



is used to reconstruct an a priori known function p™'^'^. 
A discrete mock spectrum is calculated from an analytic 
expression containing the sum of four Gaussian peaks 
with parameters as shown in Tab IT] From it a set of data- 
points is generated via Eq. ||3| and after distorting these 
through Gaussian noise they are fed to the MEM code as 
input measurements. We use NJ^°* — 5000 to construct 





1st peak 


2nd peak 


3rd peak 


4th peak 


amplitude: 

position: 

width: 


-2.3 
0.1 


0.6 
0.52 
0.1 


0.25 
2.6 
0.4 


0.2 
7.5 
1.4 



TABLE I: Parameters of the Gaussian peaks used in the mock 
function pmock inspired by Lattice QCD data used in |2l 



ideal data D"*'^*', to which we add Gaussian noise at each 
individual Tt with variance 

5Qmock strength of the 
disturbance is controlled by the parameter r|: 



5pmock ^ .^pideal^ JgH,. 

We choose as prior the function 
mftul 



(20) 



cu + CUo 



even though the original spectral function did not in fact 
contain such a behavior Note that by using this function 
along the full frequency interval we explicitly acknowl- 
edge that wrong prior information is supplied. To facil- 
itate the reconstruction the prior function is normalized 
such that the area under it coincides with the integral 
over the mock spectral function. 

As discretization for the reconstruction we choose the 
frequency range w E [—10,20] divided into N^j = 1500 
points, whereas t e [0, 6.1] with IM^^ = 12. To capture the 
large dynamic range of the kernel, resulting from the in- 
clusion of negative frequencies, the internal arithmetic 
is set to use 384 bits of precision. We wish to separate 
the question of successful reconstruction from the qual- 
ity of data, hence a small noise r| — 0.0001 is used to only 
slightly distort the ideal mock data. 

When performing the MEM reconstruction of the 
mock spectrum based on Bryan's search space, we find 
the following problem 

• The spectral features are only poorly reproduced. 
Both at positive and negative frequencies peaks 
are missing or washed out. (see top left panel in 
Fig(l]andFig|2) 

• Even the mock data itself is not reproduced within 
its error-bars, i.e. the final value of Q ~ 10^ (see 
top left panel in FigjSjl 
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FIG. 2: The negative frequency part of the reconstructions of the mock function Pmock for different values of Next 
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FIG. 3: Comparison of the data D™°* generated from the mock function Pmock and the data corresponding to the MEM recon- 
struction D"^. The values for Q are taken at the value of a at which P[a|D, I(rn,)] takes its maximum. 



We can understand this failure if we look at the available 
basis functions in the top left panel of Figj4] Based on 
Eq.([T7|, the SVD search space is solely determined by 
the number of data-points and the values for cumax 
and cUjnin- Hence we are left with 12 basis functions in 
the present case. 



For our choice of cOmin — —^0, Fig|4] shows that the 
available basis functions oscillate mostly in the nega- 
tive frequency range and almost no structure remains 
at cu > 0. It is therefore very difficult to reproduce 
any sharply peaked spectral structure in the positive do- 
main. Note that the mock spectrum includes peaks that 
are located widely apart, which prevents us from reme- 
dying the problem by just shifting the overall frequency 
interval towards positive frequencies. 

Bryan's prescription to select the SVD search space 
in general does not make any reference to our choice 
of cUmin- I.e. we are allowed to set its value arbitrar- 
ily large and negative. Since his basis functions are ob- 
tained from the first N^: columns of the SVD matrix U, 
their oscillating behavior always starts at the most nega- 
tive frequency as shown in Figls] Hence it is always pos- 
sible to make the MEM fail within Bryan's search space 
by just extending the frequency interval to larger nega- 
tive values of cu. 



^ Keeping Aw in the discretization fixed while sending w^i^ — > — oo 
guarantees that in the full search space the correct extremum can 
still be found. 



IV. EXTENSION OF THE SEARCH SPACE 

The reason for the popularity of Bryan's approach is 
that it apparently offers a dramatic decrease in computa- 
tional cost from No, to N degrees of freedom. However 
the proof on the existence and uniqueness of an MEM 
solution in [6 1 applies only to the full M'^ search space. 
In addition, as we have seen above, the reconstruction in 
the SVD subspace can always be made to fail by choos- 
ing CUmin large and negative. 

Therefore I propose to systematically enlarge the 
search space starting from Bryan's SVD subspace with 
the prospect of locating the correct global extremum 
of the functional Q(p, D, m) already with a number of 
Next < No) basis functions. 

As an arbitrary choice, I decide to extend the search 
space by including more and more of the columns of the 
matrix U in the parametrization of the spectrum, so that 
now 



(21) 



k=1 



with < Next < ^cu- 

The number of basis vectors required to adequately 
determine the global extremum can then be determined 
by increasing the number Next until the minimal value 
of Q(p, D, ra) does not decrease when adding an addi- 
tional basis function. In the worst case this process has 
to be continued until Ngxt = '^w since only the full set 
of columns of U encodes a complete set of basis vectors 
for the M"^-'. 
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FIG. 4: Basis vectors of the search space for different values of Next 
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FIG. 5: The basis vectors of Bryan's search space for different uJmm but constant resolution Atu = 0.02. 



Mock data analysis with extended search space 

The improvement in the reconstruction of the mock 
spectrum of section|lIl]af ter increasing the number of ba- 
sis vectors is clearly seen in the various panels of Fig IT] 
and Figj2] With Bryan's approach (in the top left panel)7 
we are not able to capture the cu < peak at all and the 
positive frequency peaks are severely washed out. In- 
creasing the number of basis vectors (their number be- 
ing denoted by the subscript in pNexi) the reconstructed 
spectra improve even though no additional data has 
been included. Nevertheless in the present case of only 
= 12 data-points, we find that only the lowest ly- 
ing peak on the positive frequency side is reconstructed 
in a stable manner. All higher lying structures show a 
significant amoimt of variation when changing between 
different basis sets. 

A more quantitative indicator for improvement of the 
method is the reduction of the value of the most prob- 
able Q in the extended space compared to Bryan's pre- 
scription. The most probable spectral function is 
obtained at the global extremum of the functional Q, 
which as we can see in FigjSjclearly lies outside the SVD 



space. 

Small values of Q also correspond to a successful re- 
production of the supplied data-set by D f . If we take a 
look at the result using Bryan's approach in FigjS] (the 
left panel of the first row), we can see that the data- 
points between 4.5 < t < 6 do not match within the 
small error bars. This leads to a large value of Q where 
C ^ S. Only when we go to the extended search spaces 
are C and thus Q able to decrease, such that their value 
actually stabilizes around the same order of magnitude 
as S. 



V. CONCLUSION 

The Maximum Entropy method offers a solution to 
the question of how to bring meaning to the ill-defined 
problem ||3} of inferring the values pi from a noisy and 
finite data-set Di. Instead of maximizing only the like- 
lihood probability with a test spectral function pi, one 
regularizes the process by including the prior probabil- 
ity The function p^^M represents the extremum of 
1 10 1 is hence the most probable answer in the Bayesian 
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sense. 

Since the choice of the SVD basis functions does not 
depend on the choice of cUmin and their number is fixed 
by the supplied number of data-points, I argued that 
Bryan's search space does not in general contain the cor- 
rect global extremum of the functional Q(p, D, m). Nu- 
merical evidence was presented to support this conclu- 
sion. I thus propose to systematically expand the search 
space to < Next < No) dimensions until the correct 
global extremum of the functional has been found. 

Introducing a large number of basis functions in- 
evitably leads to the appearance of "wiggly" structures 
in the reconstructed spectral function p'^^'^(cu). Since 
they are not constrained by the data, such artifacts can 
be identified through a variation of the prior function. 
In turn, those features of p^^^(cu) that are reliably en- 
coded in the data do not suffer from such variations of 
rai. 



A. Recipe for a consistent MEM 

1. Measure the data points with as small error as pos- 
sible. 

2. Construct a set of prior functions that vary signifi- 
cantly in the region where one expects the data to 
encode a spectral signal. 

3. Search for the global maximum of P[p|DTa]: 
Start with Bryan's search space and increase the 
number of basis vectors until the extremal value 
of Q(p) does not change appreciably when adding 
an additional basis vector. 

4. For the spectral structures obtained in this way, 
carry out the determination of the intrinsic error 
according to |6|. 

5. Repeat from step 3 after changing to another prior 
function of the set chosen in step 2. 

6. By identifying those spectral features that remain 
unchanged under the variation of the prior and 
that show small errors from step 4., locate the 
structures that are truly constrained by the data. 
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Appendix A: Efficient marginalization of a 

In this appendix I would like to mention a techni- 
cal detail regarding the implementation of the proce- 



dure to marginalize the artificial parameter a inserted 
m Eq.|[l2). To this end one calculates the maximum p" 
of Q(p, D, m, a) for many different values of a and then 
self consistently averages the results [l] [SHZl using the 
following relation 



pM™(a)) 



Vp dap(a))P[p|D, I(m), a]P[a|D, I(m)] 

(Al) 

dap«(cu)P[alD,I(m)] (A2) 



The explicit expression of P[a|D, I(m)] has been shown 
to be 



P[a|D,I{Ta)] cx 



exp 



Q[D,p«,I(m]] + - Y_ log 



k=0 



(A3) 
(A4) 



where are the N^^ non-zero eigenvalues of the matrix 



A" 



5pi6pj 



(A5) 



Let us see why this symmetric N uj x N uj matrix only 
contains such a small number of nonzero eigenvalues. 
Using the Singular Value Decomposition of = U£V* 
( U is the No, X Nt: sized matrix consisting of the first 
columns of the matrix U in Eq. and £ and V 
the corresponding matrices of size N^^ x N^^) we can 
rewrite (\/p" denotes the vector obtained after applying 
the square root to each individual component p") 



A" 



6DP5DP 



VlU^yp^. (A6) 



With the additional definition of the two symmetric 
IMt X IMt matrices 



8^C 



6DP6DP 



VI 



9 = 9" 



T = UMiag[p]U 



(A7) 
(A8) 



and an application of Sylvesters determinant theorem 
we can rewrite the Eigenvalue equation for the matrix A 
as 



= det 
= det 

= det 



8^C 



/p^UlV 



(A9) 



5DP5DP 



VIU*yp^-AkI 



(AlO) 



IV^ 



8^C 



5DP5DP 
= det MT-AklN,xN, 



VIU^^V^U-AklN.xN.J 

(All) 
(A12) 
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Here it is important to realize that the product of two 
symmetric matrices is not necessarily symmetric, i.e. 
MT 7^ TM, so that algorithms for Hermitian matrices 
cannot be used. Of course, since the spectrum of the 
original matrix A is real, the matrix MT does not harbor 
any complex eigenvalues. In addition we see that the 
matrix A contains two factors of the Kernel K, which, in 
the case of a large dynamical range in K, requires arith- 
metic of around twice the precision compared to the rest 
of the procedure to yield correct values. 



The reader should also be aware that the above ma- 
nipulations are independent from our choice of search 
space. The matrix A always has N.^ non-zero eigenval- 
ues even if we choose a search space with a different 
dimensionality. Furthermore we note that if one arti- 
ficially restricts the SVD space by taking only a num- 
ber of singular values larger than a certain small value 
into account, one has to ensure that the corresponding 
eigenvalues of the Matrix A that one discards are small 
in comparison to the values a. 
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